Abstract: The performance of image classification networks is constrained by their reliance on spatial domain features and their neglect of frequency domain features. To address these issues, a two-domain feature association network for image classification (TANet) is proposed. First, a frequency domain feature extraction (FDFE) module is designed. The Fast Fourier Transform is employed to capture frequency domain details and global features in the image, reducing the loss of key features, strengthening the representation of image details, and improving the feature extraction ability of the network. Then, a frequency domain attention mechanism (FDAM) is proposed. Multi-scale spatial domain features are taken into account and combined with the Fast Fourier Transform to extract frequency domain information, enhancing the sensitivity to image details and increasing the contribution of key regions. Subsequently, a two-domain feature association mechanism (TFAM) is designed to fuse frequency domain features with spatial domain features. While spatial domain features are retained, frequency domain features supplement the image details and global information, thereby strengthening the expressive power of the features. Finally, FDAM is embedded into the residual branch to learn the two-domain features of the input more effectively, balancing the attention paid to local and global information, improving the availability of key features, and enhancing the classification capability of the network. Experiments on five public datasets show that TANet improves image classification performance by incorporating frequency domain features: it extracts image details and global features, reduces the loss of key features, sharpens the perception of important regions, and strengthens the expression of features.
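To make the abstract's pipeline concrete, the following is a minimal PyTorch sketch of the three components it names: an FFT-based feature branch standing in for FDFE, a channel attention derived from the amplitude spectrum standing in for FDAM, and a learnable gated fusion standing in for TFAM, all wired into a residual block. The class names, the 1x1 frequency-domain convolution, the single-scale amplitude pooling, and the scalar fusion gate are illustrative assumptions; the paper's actual layer layout, multi-scale design, and fusion rule are not reproduced here.

```python
import torch
import torch.nn as nn


class FDFE(nn.Module):
    """Sketch of frequency domain feature extraction (assumed design):
    2-D FFT -> learnable 1x1 conv on real/imag parts -> inverse FFT."""

    def __init__(self, channels: int):
        super().__init__()
        # One 1x1 conv acts on the stacked real and imaginary parts.
        self.freq_conv = nn.Conv2d(2 * channels, 2 * channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        c = x.shape[1]
        f = torch.fft.fft2(x, norm="ortho")           # complex spectrum
        f = torch.cat([f.real, f.imag], dim=1)        # (B, 2C, H, W)
        f = self.freq_conv(f)                         # learn in frequency domain
        f = torch.complex(f[:, :c], f[:, c:])         # back to complex
        return torch.fft.ifft2(f, norm="ortho").real  # back to spatial domain


class FDAM(nn.Module):
    """Hypothetical frequency domain attention: channel weights computed
    from the globally pooled FFT amplitude spectrum (a single-scale
    stand-in for the paper's multi-scale FDAM)."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        amp = torch.fft.fft2(x, norm="ortho").abs()   # amplitude spectrum
        w = self.mlp(amp.mean(dim=(2, 3)))            # (B, C) channel weights
        return x * w[:, :, None, None]


class TwoDomainBlock(nn.Module):
    """Residual block with FDAM embedded in the residual branch and a
    learnable scalar gate standing in for the paper's TFAM fusion."""

    def __init__(self, channels: int):
        super().__init__()
        self.spatial = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        self.fdam = FDAM(channels)
        self.fdfe = FDFE(channels)
        self.gate = nn.Parameter(torch.zeros(1))      # fusion weight (assumed)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        s = self.fdam(self.spatial(x))                # attended spatial branch
        f = self.fdfe(x)                              # frequency branch
        g = torch.sigmoid(self.gate)
        return x + g * f + (1 - g) * s                # residual two-domain fusion
```

Stacking such blocks in place of standard residual blocks is one plausible reading of how TANet augments a backbone; the gated sum merely illustrates the idea of retaining spatial features while letting frequency features supplement them.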